Even though the USA is a first world country, 40 million Americans, including 12 million children, are “food insecure.” The U.S. Department of Agriculture (USDA), defines food insecurity as an “economic and social condition of limited or uncertain access to adequate food.” Exactly what this definition means is matter of metrics and their interpretation. Because food insecurity may be affecting different demographics differently, we should measure and understand those differences. Food insecurity can lead to negative consequences for individuals’ health and economic well being. What those consequences are and their severity can also be measured. If food insecurity is weighing down a significant part of our society, and if we choose to do something about it, we should try to understand the factors that may be related to or may be predictors of food insecurity. If we design and implement policies and programs to address food insecurity, we should be able to measure their impact on the problem.
The USDA collects and maintains data related to issues outlined above in a “Food Environment Atlas.” The fundamental measure of food insecurity comes from an annual household survey. The survey has questions to assess different experiences and behaviors that indicate food insecurity, such as being unable to afford balanced meals, cutting the size of meals because of too little money for food, or being hungry because of too little money for food. The food security status of a household is designated as secure, insecure, or very insecure based on the number of food-insecure conditions reported. This data is aggregated to the state level in the Atlas.
A hierarchy of factors are believed to be causes of food insecurity. These factors are availability (of food sources in a geographic region), access (to the food sources by the population), and utilization (of food sources by the population). The Atlas maintains three broad categories of data:
Food Choices—Indicators of the community's access to and acquisition of healthy, affordable food, such as: access and proximity to a grocery store; number of food stores and restaurants; expenditures on fast foods; food and nutrition assistance program participation; food prices; food taxes; and availability of local foods.
Health and Well-Being—Indicators of the community's success in maintaining healthy diets, such as: food insecurity; diabetes and obesity rates; and physical activity levels.
Community Characteristics—Indicators of community characteristics that might influence the food environment, such as: demographic composition; income and poverty; population loss; metro-nonmetro status; natural amenities; and recreation and fitness centers.
These add up to over 275 indicators of the food environment collected at the county or the state level. This data can be processed and analyzed to create maps showing variation of individual indicators across the country, create charts showing variation in a single indicator over time, and correlation between individual factors. Additional data related to pilot or new programs can potentially be combined with the Atlas data to judge the effectiveness of those programs.
The purpose of this project is to examine the factors affecting food insecurity on a county and state wide level throughout the US. We will get our data from the Food Environment Atlas (located at https://www.ers.usda.gov/data-products/food-environment-atlas.aspx) and use data processing techniques to find the relationships between several variables, including food access, food availability, assistance, prices, local foods, demographics, insecurity, and health.
For the data collection, you will need the following modules:
import numpy as np
import datetime as dt
import pandas as pd
from math import sqrt
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets, linear_model
import statistics as stat
import xlrd
import json
import vincent
from IPython.display import Image
# vincent.core.initialize_notebook()
# world_topo = r'world-countries.topo.json'
# state_topo = r'us_states.topo.json'
# lake_topo = r'lakes_50m.topo.json'
# county_geo = r'us_counties.geo.json'
# county_topo = r'us_counties.topo.json'
# or_topo = r'or_counties.topo.json'
Due to technical difficulties, it was not possible to export the vincent code properly to html, so it had to be commented out and a workaround was found. The next section goes through parsing and cleaning the data. Below represents the code to read the file and find the sheet names.
xl = pd.ExcelFile("DataDownload.xls")
xl.sheet_names
Each sheet is individually parsed and added to its own dataframe. A dataframe is a special data structure designed for data processing where each entry is an individual instance and contains all possible pieces of data for that instance. In this case, each entry represents a single county by FIPS code.
df_data = xl.parse("Supplemental Data - County")
df_acc = xl.parse("ACCESS")
df_acc = df_acc.drop(['State', 'County'], axis = 1)
df_sto = xl.parse("STORES")
df_sto = df_sto.drop(['State', 'County'], axis = 1)
df_res = xl.parse("RESTAURANTS")
df_res = df_res.drop(['State', 'County'], axis = 1)
df_ass = xl.parse("ASSISTANCE")
df_ass = df_ass.drop(['State', 'County'], axis = 1)
df_ins = xl.parse("INSECURITY")
df_ins = df_ins.drop(['State', 'County'], axis = 1)
df_pri = xl.parse("PRICES_TAXES")
df_pri = df_pri.drop(['State', 'County'], axis = 1)
df_loc = xl.parse("LOCAL")
df_loc = df_loc.drop(['State', 'County'], axis = 1)
df_hea = xl.parse("HEALTH")
df_hea = df_hea.drop(['State', 'County'], axis = 1)
df_soc = xl.parse("SOCIOECONOMIC")
df_soc = df_soc.drop(['State', 'County'], axis = 1)
Below is more formatting to make the data easier to work with, and then finally combining the dataframes for each sheet into one large dataframe.
df = df_data.copy()
df = df.rename(columns = {'FIPS ': 'FIPS', 'State': 'State_temp', 'County': 'County_temp'})
df = df.merge(df_acc, how = 'inner', on = 'FIPS')
df = df.merge(df_sto, how = 'inner', on = 'FIPS')
df = df.merge(df_res, how = 'inner', on = 'FIPS')
df = df.merge(df_ass, how = 'inner', on = 'FIPS')
df = df.merge(df_ins, how = 'inner', on = 'FIPS')
df = df.merge(df_pri, how = 'inner', on = 'FIPS')
df = df.merge(df_loc, how = 'inner', on = 'FIPS')
df = df.merge(df_hea, how = 'inner', on = 'FIPS')
df = df.merge(df_soc, how = 'inner', on = 'FIPS')
df = df.rename(columns = {'State_temp': 'State', 'County_temp': 'County'})
temp = list(df['FIPS'])
for i in range(len(temp)):
temp[i] = str(temp[i])
if len(temp[i]) < 5:
temp[i] = '0' + temp[i]
df['FIPS'] = temp
temp = list(df['2010 Census Population'])
for i in range(len(temp)):
temp[i] = int(temp[i].replace(',',''))
df['2010 Census Population'] = temp
temp = list(df['Population Estimate, 2011'])
for i in range(len(temp)):
temp[i] = int(temp[i].replace(',',''))
df['Population Estimate, 2011'] = temp
temp = list(df['Population Estimate, 2012'])
for i in range(len(temp)):
temp[i] = int(temp[i].replace(',',''))
df['Population Estimate, 2012'] = temp
temp = list(df['Population Estimate, 2013'])
for i in range(len(temp)):
temp[i] = int(temp[i].replace(',',''))
df['Population Estimate, 2013'] = temp
temp = list(df['Population Estimate, 2014'])
for i in range(len(temp)):
temp[i] = int(temp[i].replace(',',''))
df['Population Estimate, 2014'] = temp
temp = list(df['Population Estimate, 2015'])
for i in range(len(temp)):
temp[i] = int(temp[i].replace(',',''))
df['Population Estimate, 2015'] = temp
temp = list(df['Population Estimate, 2016'])
for i in range(len(temp)):
temp[i] = int(temp[i].replace(',',''))
df['Population Estimate, 2016'] = temp
This is a snapshot of the entire dataframe.
df
list(df.columns)
After formatting the column names as strings, all formatting is done, and the data processing and visualization can begin.
m = list(df.columns)
for i in range(len(m)):
m[i] = str(m[i])
df = df[m]
df
A new normalized dataframe is created in preparation for the covariance matrix, since a covariance of two normalized variables is just the correlation coefficient, which is much easier to understand.
norm_df = df.copy()
norm_factors = {}
for st in norm_df.columns:
if st == 'FIPS' or st == 'State' or st == 'County':
continue
temp = list(norm_df[st])
total = len(temp)
mn = 0.0
for i in range(0, len(temp)):
if np.isnan(temp[i]):
total -= 1
else:
mn += temp[i]
mn /= total
st_dev = 0.0
for i in range(len(temp)):
if not np.isnan(temp[i]):
st_dev += ((mn - temp[i]) ** 2)
st_dev /= total
st_dev = sqrt(st_dev)
if(st_dev < 0.000001):
st_dev = 1.0
norm_factors[st] = [mn, st_dev]
for i in range(len(temp)):
if not np.isnan(temp[i]):
temp[i] = (temp[i] - mn) / st_dev
norm_df[st] = temp
norm_df
We choose 8 important variables from this dataframe: median household income in 2015, poverty rate in 2015, percentage with low access to food in 2015, grocery stores per 1000 people in 2014, percentage of SNAP participation in 2016, percentage of WIC participation in 2016, farmers markets per 1000 people in 2016, and finally the dependent variable: percentage of food insecurity from 2013-2015. Below are the heatmaps of each of these variables by county and a description of the variables.
Indicator: Median household income
Geographic level: County
Definition: Median income by household: an income level that divides county households in half, one half with income above the median and the other half with income below the median; includes income of all household members 15 years old or older.
Data sources: USDA's Economic Research Service, Atlas of Rural and Small-Town America, using data from the U.S. Census Bureau, Small Area Income and Poverty Estimates.
Available year: 2015
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='MEDHHINC15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 20000, 30000, 40000, 50000, 70000, 100000, 150000]
# vis.legend(title='Median Household Income 2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='MEDHHINC15.png')
Indicator: Poverty rate
Geographic level: County
Definition: Percent of the county population living in families with income below the poverty threshold; poverty status thresholds vary by family size, number of children, and age of householder. If a family's pre-tax money income is less than the dollar value of their threshold, then that family and every individual in it are considered to be poor. For people not living in families, poverty status is determined by comparing the individual's income to his or her poverty threshold.
Data sources: USDA's Economic Research Service, Atlas of Rural and Small-Town America, using data from the U.S. Census Bureau, Small Area Income and Poverty Estimates.
Available year: 2015
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='POVRATE15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 2, 5, 10, 15, 20, 30, 50]
# vis.legend(title='Poverty Rate 2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='POVRATE15.png')
Indicator: Population, low access to store (percent)
Geographic level: County
Definition: Percentage of people in a county living more than 1 mile from a supermarket or large grocery store if in an urban area, or more than 10 miles from a supermarket or large grocery store if in a rural area.
Data sources: In each of the referenced reports, a directory of supermarkets and large grocery stores authorized to accept SNAP benefits was merged with Trade Dimensions' TDLinx directory of stores within the United States, including Alaska and Hawaii, for the years 2010 and 2015. Stores met the definition of a supermarket or large grocery store if they reported at least $2 million in annual sales and contained all the major food departments found in a traditional supermarket, including fresh meat and poultry, dairy, dry and packaged foods, and frozen foods. The combined list of supermarkets and large grocery stores was converted into a GIS-usable format by geocoding the street address into store-point locations. Population data are reported at the block level from the 2010 Census of Population and Housing. These population data were aerially allocated down to ½-kilometer-square grids across the United States. For each ½-kilometer-square grid cell, the distance was calculated from its geographic center to the center of the grid cell with the nearest supermarket. Rural or urban status is designated by the Census Bureau’s Urban Area definition. Once distance to the nearest supermarket or large grocery store was calculated for each grid cell, the number of individuals living more than 1 mile from a supermarket or large grocery store in urban areas and more than 10 miles from a supermarket or large grocery store in rural areas was aggregated to the county level and divided by the total number of individuals in the county to obtain the percentage of total population in the county that resided more than 1 or 10 miles from a supermarket.
Available years: 2010 and 2015
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='PCT_LACCESS_POP15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 5, 10, 20, 30, 50, 75, 100]
# vis.legend(title='Percentage with Low Access 2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='PCT_LACCESS_POP15.png')
Indicator: Grocery stores/1,000 pop
Geographic level: County
Definition: The number of supermarkets and grocery stores in the county per 1,000 county residents.
Data sources: Store data are from the U.S. Census Bureau, County Business Patterns. Population data are from the U.S. Census Bureau, Population Estimates. Grocery stores (defined by North American Industry Classification System (NAICS) code 445110) include establishments generally known as supermarkets and smaller grocery stores primarily engaged in retailing a general line of food, such as canned and frozen foods; fresh fruits and vegetables; and fresh and prepared meats, fish, and poultry. Included in this industry are delicatessen-type establishments primarily engaged in retailing a general line of food. Convenience stores, with or without gasoline sales, are excluded. Large general merchandise stores that also retail food, such as supercenters and warehouse club stores, are excluded.
Available years: 2009 and 2014
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='GROCPTH14', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 0.1, 0.25, 0.5, 1, 2, 3, 4]
# vis.legend(title='Grocery Stores per 1000 pop. 2014')
# vis.to_json('vega.json')
# vis.display()
Image(filename='GROCPTH14.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='PCT_SNAP16', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 5, 7.5, 10, 12.5, 15, 20, 25]
# vis.legend(title='Percentage SNAP participation 2016')
# vis.to_json('vega.json')
# vis.display()
Image(filename='PCT_SNAP16.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='PCT_WIC15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 1, 1.25, 1.5, 1.75, 2, 2.5, 3]
# vis.legend(title='Percentage WIC participation 2016')
# vis.to_json('vega.json')
# vis.display()
Image(filename='PCT_WIC15.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='FMRKTPTH16', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5]
# vis.legend(title='Farmers Markets per 1000 pop. 2016')
# vis.to_json('vega.json')
# vis.display()
Image(filename='FMRKTPTH16.png')
# geo_data = [{'name': 'counties',
# 'url': county_topo,
# 'feature': 'us_counties.geo'},
# {'name': 'states',
# 'url': state_topo,
# 'feature': 'us_states.geo'}
# ]
# vis = vincent.Map(data=df, geo_data=geo_data, scale=1000, projection='albersUsa',
# data_bind='FOODINSEC_13_15', data_key='FIPS', map_key={'counties': 'properties.FIPS'})
# del vis.marks[1].properties.update
# vis.marks[0].properties.update.fill.value = '#084081'
# vis.marks[1].properties.enter.stroke.value = '#fff'
# vis.marks[0].properties.enter.stroke.value = '#7bccc4'
# vis.scales['color'].type = 'threshold'
# vis.scales['color'].domain = [0, 7.5, 10, 12.5, 15, 17.5, 20, 25]
# vis.legend(title='Percentage Food Insecurity 2013-2015')
# vis.to_json('vega.json')
# vis.display()
Image(filename='FOODINSEC_13_15.png')
var_list = ['MEDHHINC15', 'POVRATE15', 'PCT_LACCESS_POP15', 'GROCPTH14', 'PCT_SNAP16', 'PCT_WIC15', 'FMRKTPTH16', 'FOODINSEC_13_15']
col_list = list(df.columns)
for name in col_list:
if name not in var_list:
del df[name]
for name in col_list:
if name not in var_list:
del norm_df[name]
for name in var_list:
df = df[np.isfinite(df[name])]
for name in var_list:
norm_df = norm_df[np.isfinite(norm_df[name])]
df = df[var_list]
norm_df = norm_df[var_list]
corr_mat = norm_df.corr()
sns.heatmap(corr_mat, cmap = 'RdYlGn', annot = True)
def transpose(l):
new_l = []
for i in range(len(l)):
new_l += [[l[i]]]
return new_l
sns.regplot(x='MEDHHINC15', y='PCT_LACCESS_POP15', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['PCT_LACCESS_POP15'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='GROCPTH14', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['GROCPTH14'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='PCT_SNAP16', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['PCT_SNAP16'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='PCT_WIC15', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['PCT_WIC15'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='FMRKTPTH16', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['FMRKTPTH16'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))
sns.regplot(x='MEDHHINC15', y='FOODINSEC_13_15', data=df, fit_reg=True, marker='+')
lr = linear_model.LinearRegression().fit(transpose(list(df['MEDHHINC15'])), df['FOODINSEC_13_15'])
print("Coefficient: " + str(lr.coef_[0]))
print("Intercept: " + str(lr.intercept_))